
    Absolute Convergence of Rational Series is Semi-decidable

    We study \emph{real-valued absolutely convergent rational series}, i.e. functions $r: \Sigma^* \rightarrow {\mathbb R}$, defined over a free monoid $\Sigma^*$, that can be computed by a multiplicity automaton $A$ and such that $\sum_{w\in \Sigma^*}|r(w)|<\infty$. We prove that any absolutely convergent rational series $r$ can be computed by a multiplicity automaton $A$ which has the property that $r_{|A|}$ is simply convergent, where $r_{|A|}$ is the series computed by the automaton $|A|$ derived from $A$ by taking the absolute values of all its parameters. Then, we prove that the set ${\cal A}^{rat}(\Sigma)$ composed of all absolutely convergent rational series is semi-decidable, and we show that the sum $\sum_{w\in \Sigma^*}|r(w)|$ can be estimated to any accuracy for any $r\in {\cal A}^{rat}(\Sigma)$. We also introduce a spectral radius-like parameter $\rho_{|r|}$ which satisfies the following property: $r$ is absolutely convergent iff $\rho_{|r|}<1$.
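To make the objects in this abstract concrete, here is a minimal Python sketch (not the paper's procedure). It assumes a multiplicity automaton is given as an initial vector, one transition matrix per letter, and a final vector. It sums the series computed by the absolute-value automaton $|A|$ over all words up to a chosen length, which bounds $\sum_{w}|r(w)|$ from above for those words. The function name `abs_sum_estimate` and the parameter `max_len` are hypothetical.

```python
def abs_sum_estimate(init, trans, final, max_len=200):
    """Sum of r_{|A|}(w) over all words w with len(w) <= max_len.

    init, final: lists of floats (initial and final weight vectors).
    trans: dict mapping each letter to an n x n matrix (list of lists).
    Hypothetical helper for illustration only.
    """
    n = len(init)
    # M is the sum over letters of the entrywise absolute transition matrices,
    # so |init| * M^k * |final| is the total weight of all words of length k.
    M = [[sum(abs(trans[a][i][j]) for a in trans) for j in range(n)]
         for i in range(n)]
    v = [abs(x) for x in init]
    t = [abs(x) for x in final]
    total = 0.0
    for _ in range(max_len + 1):
        total += sum(v[i] * t[i] for i in range(n))
        # Advance one letter: v <- v * M (row vector times matrix).
        v = [sum(v[i] * M[i][j] for i in range(n)) for j in range(n)]
    return total


# One-state automaton over {a, b}: r(w) = (+/-)0.25^len(w), so the
# absolute sum over all words is the geometric series sum_k 0.5^k.
estimate = abs_sum_estimate([1.0], {"a": [[0.25]], "b": [[-0.25]]}, [1.0])
print(estimate)  # close to 2
```

The truncated sums converge only when the weights of $|A|$ shrink fast enough along long words; for an automaton where they do not, the running total simply keeps growing with `max_len`.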

    Impact Of The Energy Model On The Complexity Of RNA Folding With Pseudoknots

    Predicting the folding of an RNA sequence, while allowing general pseudoknots (PK), consists in finding a minimal free-energy matching of its $n$ positions. Assuming independently contributing base-pairs, the problem can be solved in $\Theta(n^3)$ time using a variant of maximum weighted matching. By contrast, the problem was previously proven NP-hard in the more realistic nearest-neighbor energy model. In this work, we consider an intermediate model, called the stacking-pairs energy model. We extend a result by Lyngsø, showing that RNA folding with PK is NP-hard within a large class of parametrizations of the model. We also show the approximability of the problem, by giving a practical $\Theta(n^3)$ algorithm that achieves at least a 5-approximation for any parametrization of the stacking model. This contrasts with the nearest-neighbor version of the problem, which we prove cannot be approximated within any positive ratio by a polynomial-time algorithm, unless $P=NP$.
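The base-pair model in this abstract reduces folding with arbitrary pseudoknots to a weighted matching over sequence positions. The Python sketch below illustrates that reduction only: it is an exponential-time exhaustive search over bitmasks, not the $\Theta(n^3)$ matching algorithm the abstract refers to. The pair weights (standing in for negated pair energies) and the names `pair_weight` and `best_matching` are assumptions made for illustration.

```python
from functools import lru_cache

# Hypothetical pair weights (larger = more stable); not taken from the paper.
WEIGHTS = {("C", "G"): 3, ("A", "U"): 2, ("G", "U"): 1}


def pair_weight(a, b):
    """Weight of pairing bases a and b; 0 means the pair is not allowed."""
    return WEIGHTS.get(tuple(sorted((a, b))), 0)


def best_matching(seq, weight=pair_weight):
    """Maximum total weight of a matching on positions of seq.

    Crossing pairs are allowed, so pseudoknots are included.
    Brute force over subsets: usable only for short sequences.
    """
    n = len(seq)

    @lru_cache(maxsize=None)
    def solve(mask):
        # mask is a bitset of positions that are still available.
        if mask == 0:
            return 0.0
        i = (mask & -mask).bit_length() - 1  # lowest available position
        rest = mask & ~(1 << i)
        best = solve(rest)  # option 1: leave position i unpaired
        for j in range(i + 1, n):
            if (rest >> j) & 1:
                w = weight(seq[i], seq[j])
                if w > 0:  # option 2: pair i with j
                    best = max(best, w + solve(rest & ~(1 << j)))
        return best

    return solve((1 << n) - 1)


# In "GACU" the best matching pairs G-C (positions 0, 2) and A-U
# (positions 1, 3); these two pairs cross, i.e. they form a pseudoknot.
print(best_matching("GACU"))
```

Because each base pair contributes independently in this model, the search never needs to look at neighboring pairs; that independence is what the stacking and nearest-neighbor models give up.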

    Inapproximability of maximal strip recovery

    In comparative genomics, the first step of sequence analysis is usually to decompose two or more genomes into syntenic blocks that are segments of homologous chromosomes. For the reliable recovery of syntenic blocks, noise and ambiguities in the genomic maps need to be removed first. Maximal Strip Recovery (MSR) is an optimization problem proposed by Zheng, Zhu, and Sankoff for reliably recovering syntenic blocks from genomic maps in the midst of noise and ambiguities. Given $d$ genomic maps as sequences of gene markers, the objective of MSR-$d$ is to find $d$ subsequences, one subsequence of each genomic map, such that the total length of syntenic blocks in these subsequences is maximized. For any constant $d \ge 2$, a polynomial-time $2d$-approximation for MSR-$d$ was previously known. In this paper, we show that for any $d \ge 2$, MSR-$d$ is APX-hard, even for the most basic version of the problem in which all gene markers are distinct and appear in positive orientation in each genomic map. Moreover, we provide the first explicit lower bounds on approximating MSR-$d$ for all $d \ge 2$. In particular, we show that MSR-$d$ is NP-hard to approximate within $\Omega(d/\log d)$. From the other direction, we show that the previous $2d$-approximation for MSR-$d$ can be optimized into a polynomial-time algorithm even if $d$ is not a constant but is part of the input. We then extend our inapproximability results to several related problems including CMSR-$d$, $\delta$-gap-MSR-$d$, and $\delta$-gap-CMSR-$d$. Comment: A preliminary version of this paper appeared in two parts in the Proceedings of the 20th International Symposium on Algorithms and Computation (ISAAC 2009) and the Proceedings of the 4th International Frontiers of Algorithmics Workshop (FAW 2010).

    A Combinatorial Framework for Designing (Pseudoknotted) RNA Algorithms

    We extend a hypergraph representation, introduced by Finkelstein and Roytberg, to unify dynamic programming algorithms in the context of RNA folding with pseudoknots. Classic applications of RNA dynamic programming (energy minimization, partition function, base-pair probabilities...) are reformulated within this framework, giving rise to very simple algorithms. This reformulation allows one to conceptually detach the conformation space/energy model -- captured by the hypergraph model -- from the specific application, assuming unambiguity of the decomposition. To ensure the latter property, we propose a new combinatorial methodology based on generating functions. We extend the set of generic applications by proposing an exact algorithm for extracting generalized moments in weighted distributions, generalizing a prior contribution by Miklos et al. Finally, we illustrate our full-fledged programme on three exemplary conformation spaces (secondary structures, Akutsu's simple type pseudoknots and kissing hairpins). This readily gives sets of algorithms that are either novel or have complexity comparable to classic implementations for minimization and Boltzmann ensemble applications of dynamic programming.

    Metrics and similarity measures for hidden Markov models

    Hidden Markov models were introduced in the beginning of the 1970's as a tool in speech recognition. During the last decade they have been found useful in addressing problems in computational biology such as characterising sequence families, gene finding, structure prediction and phylogenetic analysis. In this paper we propose several measures between hidden Markov models. We give an efficient algorithm that computes the measures for left-right models, e.g. profile hidden Markov models, and briefly discuss how to extend the algorithm to other types of models. We present an experiment using the measures to compare hidden Markov models for three classes of signal peptides. Introduction: A hidden Markov model describes a probability distribution over a potentially infinite set of sequences. It is convenient to think of a hidden Markov model as generating a sequence according to some probability distribution by following a first-order Markov chain of states, called the path, from a sp..

    Approximating the 2-Interval Pattern problem

    We address the problem of approximating the 2-Interval Pattern problem over its various models and restrictions. This problem, which is motivated by RNA secondary structure prediction, asks to find a maximum cardinality subset of a 2-interval set with respect to some prespecified model. For each such model, we give approximation results whose quality varies with the restrictions imposed on the input 2-interval set.

    Predicting RNA Secondary Structures: One-grammar-fits-all Solution

    LNCS v. 9096 entitled: Bioinformatics Research and Applications: 11th International Symposium, ISBRA 2015, Norfolk, USA, June 7-10, 2015, Proceedings. RNA secondary structures are known to be important in many biological processes. Many programs have been developed for RNA secondary structure prediction. To our knowledge, however, there still exist secondary structures of known RNA sequences which cannot be covered by these algorithms. In this paper, we provide an efficient algorithm that can handle all RNA secondary structures found in the Rfam database. We designed a new stochastic context-free grammar named Rectangle Tree Grammar (RTG) which significantly expands the classes of structures that can be modelled. Our algorithm runs in $O(n^6)$ time and the accuracy is reasonably high, with average PPV and sensitivity over 75%. In addition, the structures that RTG predicts are very similar to the real ones.

    Learning Stochastic Finite Automata

    Stochastic deterministic finite automata have been introduced and are used in a variety of settings. We report here a number of results concerning the learnability of these finite state machines. In the setting of identification in the limit with probability one, we prove that stochastic deterministic finite automata cannot be identified from only a polynomial quantity of data. If concerned with approximation results, they become PAC-learnable if the $L_\infty$ norm is used. We also investigate queries that are sufficient for the class to be learnable.